Hash Tree Indexing for Fast SPARQL Query in Large Scale RDF Data Management Systems

Authors

  • Wenwen Li
  • Bingyi Zhang
  • Guozheng Rao
  • Renhai Chen
  • Zhiyong Feng
Abstract

Abstract. Over the past decade, the volume of RDF data (Resource Description Framework, a standard model for data interchange on the Web) has grown enormously, and many RDF datasets (e.g., Wikipedia) have reached billions of triples. As a result, managing this data efficiently has become a tremendous challenge. In this paper, we present HTStore, a hash-tree-based system for fast storage of and access to large-scale RDF data. The design of HTStore has three salient features. First, its compact design effectively reduces the size of the indexes. Second, HTStore uses a hash function to significantly reduce query time. Third, the proposed hash tree structure adapts easily to changes in data volume (e.g., data expansion). The experimental results demonstrate that the proposed system improves query efficiency by up to 21.3% compared with representative RDF data management systems.
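The abstract does not describe HTStore's internal layout, so the following is only a minimal sketch of a generic hash tree, not the authors' implementation. It assumes a subject-keyed index in which each level of the tree consumes a few bits of the key's hash and a leaf splits when it overflows, which is one way such a structure could grow with the data. The names `HashTreeIndex`, `key_hash`, `FANOUT_BITS`, and `BUCKET_CAPACITY`, and the `ex:` triples, are hypothetical.

```python
import hashlib

FANOUT_BITS = 2        # assumed: each internal node has 2**2 = 4 children
BUCKET_CAPACITY = 4    # assumed: a leaf splits once it holds more than 4 keys
HASH_BITS = 32
MAX_DEPTH = HASH_BITS // FANOUT_BITS  # nodes at this depth have no hash bits left


def key_hash(key: str) -> int:
    """Deterministic 32-bit hash of a key (first 4 bytes of SHA-1)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashTreeIndex:
    """Hypothetical subject index: subject -> list of (predicate, object)."""

    def __init__(self, depth: int = 0):
        self.depth = depth
        self.bucket = {}      # leaf payload; set to None after a split
        self.children = None  # list of child nodes once this node has split

    def _slot(self, h: int) -> int:
        # Pick the child using the FANOUT_BITS hash bits reserved for this level.
        return (h >> (self.depth * FANOUT_BITS)) & ((1 << FANOUT_BITS) - 1)

    def _leaf_for(self, h: int) -> "HashTreeIndex":
        # Walk down from this node until a leaf (unsplit node) is reached.
        node = self
        while node.children is not None:
            node = node.children[node._slot(h)]
        return node

    def insert(self, subject: str, predicate: str, obj: str) -> None:
        leaf = self._leaf_for(key_hash(subject))
        leaf.bucket.setdefault(subject, []).append((predicate, obj))
        # Grow the tree locally when a leaf holds too many distinct subjects.
        if len(leaf.bucket) > BUCKET_CAPACITY and leaf.depth < MAX_DEPTH:
            leaf._split()

    def _split(self) -> None:
        # Redistribute this leaf's entries over new children by the next hash bits.
        self.children = [HashTreeIndex(self.depth + 1) for _ in range(1 << FANOUT_BITS)]
        for subject, pairs in self.bucket.items():
            self.children[self._slot(key_hash(subject))].bucket[subject] = pairs
        self.bucket = None

    def lookup(self, subject: str) -> list:
        return self._leaf_for(key_hash(subject)).bucket.get(subject, [])


index = HashTreeIndex()
for i in range(100):
    index.insert(f"ex:s{i}", "ex:knows", f"ex:s{i + 1}")
index.insert("ex:s7", "ex:name", '"Seven"')
print(index.lookup("ex:s7"))
```

In this sketch a lookup hashes the subject once and follows at most `MAX_DEPTH` child pointers, and only the overflowing leaf is reorganized on a split, so the rest of the index is untouched as data is added.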

Similar resources

Towards Efficient SPARQL Query Processing on RDF Data

Efficient support for querying large-scale RDF triples plays an important role in Semantic Web data management. This paper proposes an efficient RDF query engine to evaluate SPARQL queries, where the inverted index structure is employed for indexing RDF triples. We first design and implement a set of operators on the inverted index for query optimization and evaluation. Then we propose a main-t...

SPARQL in the cloud using Rya

SPARQL is the standard query language for Resource Description Framework (RDF) data. RDF was designed with the initial goal of developing metadata for the Internet. While the number and the size of the generated RDF datasets are continually increasing, most of today’s best RDF storage solutions are confined to a single node. Working on a single node has significant scalability issues, especiall...

RDF-3X: a RISC-style engine for RDF

RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and also Web 2.0 platforms. The “pay-as-you-go” nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper p...

A role-free approach to indexing large RDF data sets in secondary memory for efficient SPARQL evaluation

Massive RDF data sets are becoming commonplace. RDF data is typically generated in social semantic domains (such as personal information management [2, 11, 13]) wherein a fixed schema is often not available a priori. We propose a simple Three-way Triple Tree (TripleT) secondary-memory indexing technique to facilitate efficient SPARQL query evaluation on such data sets. The novelty of TripleT is...

String-Based Semantic Web Data Management Using Ternary B-Trees

The Resource Description Framework (RDF) stems from the Semantic Web but can also be regarded simply as a data model, independent of its origins. Its simple structure is ideal for describing and merging heterogeneous data from different sources quickly, without having to design a complex schema first. The different nature of RDF requires new approaches for data management and query processing a...

Publication date: 2017